Noise robust digit recognition using sparse representations

Authors

  • J. F. Gemmeke
  • B. Cranen
Abstract

Despite the use of noise robustness techniques, automatic speech recognition (ASR) systems make many more recognition errors than humans, especially in very noisy circumstances. We argue that this inferior recognition performance is largely due to the fact that in ASR speech is typically processed on a frame-by-frame basis, which prevents the redundancy in the speech signal from being optimally exploited. We present a novel non-parametric classification method that can handle missing data while simultaneously exploiting the dependencies between the reliable features in an entire word. We compare the new method with a state-of-the-art HMM-based speech decoder in which missing data are imputed on a frame-by-frame basis. Both methods are tested on a single-digit recognition task (based on AURORA-2 data) using an oracle mask and an estimated harmonicity mask. We show that at an SNR of -5 dB, using the reliable features of an entire word allows an accuracy of 91% (using mel-log-energy features in combination with an oracle mask), while a conventional frame-based approach achieves only 61%. Results obtained with the harmonicity mask suggest that this specific mask estimation technique is simply unable to deliver sufficient reliable features for acceptable recognition rates at these low SNRs.
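
The idea of classifying a whole word from its reliable features only can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the solver, so a simple greedy pursuit (orthogonal matching pursuit) is swapped in here, and the function name `classify_with_mask`, the column-per-exemplar layout, and the toy random data are all assumptions made for illustration.

```python
import numpy as np


def classify_with_mask(obs, mask, exemplars, labels, n_atoms=3):
    """Label `obs` using only the features flagged reliable in `mask`.

    obs       : (D,) noisy features of one whole word (all frames stacked)
    mask      : (D,) boolean array, True where a feature is reliable
    exemplars : (D, N) clean training words, one per column (assumed layout)
    labels    : length-N sequence giving the class of each exemplar
    """
    A = exemplars[mask]                  # keep only the reliable rows
    y = obs[mask]
    norms = np.linalg.norm(A, axis=0)
    norms[norms == 0] = 1.0              # avoid division by zero
    A = A / norms

    support, coef = [], np.zeros(0)
    residual = y.copy()
    for _ in range(n_atoms):
        if np.linalg.norm(residual) < 1e-10:
            break                        # observation already explained
        corr = np.abs(A.T @ residual)
        if support:
            corr[support] = -np.inf      # never pick an exemplar twice
        support.append(int(np.argmax(corr)))
        # Refit the weights of all selected exemplars by least squares.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef

    # Score each class by the total absolute weight of its exemplars.
    scores = {}
    for idx, weight in zip(support, coef):
        scores[labels[idx]] = scores.get(labels[idx], 0.0) + abs(weight)
    return max(scores, key=scores.get)


# Toy data (random numbers, not AURORA-2 features).
rng = np.random.default_rng(0)
exemplars = rng.normal(size=(40, 6))
labels = ["three"] * 3 + ["seven"] * 3
obs = exemplars[:, 4].copy()
obs[:15] += 10.0                         # noise swamps the first 15 features
mask = np.ones(40, dtype=bool)
mask[:15] = False                        # an oracle mask marks them unreliable
print(classify_with_mask(obs, mask, exemplars, labels))
```

Because the corrupted features are dropped before the sparse fit, the remaining features of the whole word jointly determine which exemplars are selected, in contrast to reconstructing each frame in isolation.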

Similar resources

Robust Speech and Bird Song Processing using Multi-band Correlograms and Sparse Representations

Abstract of the Dissertation: Robust Speech and Bird Song Processing using Multi-band Correlograms and Sparse Representations, by Lee Ngee Tan, Doctor of Philosophy in Electrical Engineering, University of California, Los Angeles, 2014. Professor Abeer Alwan, Chair. This dissertation focuses on algorithms for robust speech and bird song processing. Many applications perform well under ideal signal conditions,...

Face Recognition in Thermal Images based on Sparse Classifier

Despite recent advances in face recognition systems, they still suffer from serious problems caused by the wide range of variations in the human face (such as lighting, glasses, head tilt, and different emotional states). Each of these factors can significantly reduce face recognition accuracy. Researchers have proposed several methods to overcome these problems. Nonetheless, in recent ye...

Robust Palmprint Recognition Based on Directional Representations

In this paper, we consider the common problem of automatically recognizing palmprints under varying illumination and image noise. Gabor wavelets are well suited to representing biometric images because their characteristics are similar to those of the human visual system. However, these Gabor-based algorithms are not robust for image recognition under non-uniform illumination and noise corruption. To improve the recogniti...

Face Recognition using an Affine Sparse Coding approach

Sparse coding is an unsupervised method that learns a set of over-complete bases to represent data such as images and video. Sparse coding has attracted increasing interest for image classification applications in recent years. However, when similar images come from different classes, as in face recognition applications, different images may be classified into the same class, and hen...

TR01: Time-continuous Sparse Imputation

An effective way to increase the noise robustness of automatic speech recognition is to label noisy speech features as either reliable or unreliable (missing) prior to decoding, and to replace the missing ones by clean speech estimates. We present a novel method to obtain such clean speech estimates. Unlike previous imputation frameworks which work on a frame-by-frame basis, our method focuses ...

Publication date: 2008